26 research outputs found

    Reconstruction of Phonated Speech from Whispers Using Formant-Derived Plausible Pitch Modulation

    Whispering is a natural, unphonated, secondary aspect of speech communications for most people. However, it is the primary mechanism of communications for some speakers who have impaired voice production mechanisms, such as partial laryngectomees, as well as for those prescribed voice rest, which often follows surgery or damage to the larynx. Unlike most people, who choose when to whisper and when not to, these speakers may have little choice but to rely on whispers for much of their daily vocal interaction. Even though most speakers will whisper at times, and some speakers can only whisper, the majority of today’s computational speech technology systems assume or require phonated speech. This article considers conversion of whispers into natural-sounding phonated speech as a noninvasive prosthetic aid for people with voice impairments who can only whisper. As a by-product, the technique is also useful for unimpaired speakers who choose to whisper. Speech reconstruction systems can be classified into those requiring training and those that do not. Among the latter, a recent parametric reconstruction framework is explored and then enhanced through a refined estimation of plausible pitch from weighted formant differences. The improved reconstruction framework, with proposed formant-derived artificial pitch modulation, is validated through subjective and objective comparison tests alongside state-of-the-art alternatives.
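The abstract above does not state the formula behind its formant-derived pitch. As a rough illustration only, the sketch below maps the spacing between adjacent formants in one frame to an artificial pitch value. The function name, weights, reference spacings, base pitch, modulation depth, and pitch range are all assumptions made for this example and are not taken from the article.

```python
def plausible_pitch(
    formants_hz,
    weights=(0.5, 0.3, 0.2),                # hypothetical weights per formant gap
    ref_diffs_hz=(1000.0, 1000.0, 1000.0),  # hypothetical "neutral" gaps
    base_f0_hz=120.0,                       # hypothetical resting pitch
    depth=0.02,                             # hypothetical modulation depth (Hz per Hz)
    f0_range_hz=(80.0, 200.0),              # hypothetical allowed pitch range
):
    """Map one frame's formants (F1..F4, in Hz) to an artificial pitch in Hz.

    Illustrative only: the pitch moves around base_f0_hz in proportion to a
    weighted sum of how far each adjacent-formant gap deviates from a
    reference gap, and is then clipped to f0_range_hz.
    """
    diffs = [hi - lo for lo, hi in zip(formants_hz, formants_hz[1:])]
    if len(diffs) != len(weights) or len(diffs) != len(ref_diffs_hz):
        raise ValueError("need one weight and one reference gap per formant gap")
    deviation = sum(w * (d - r) for w, d, r in zip(weights, diffs, ref_diffs_hz))
    f0 = base_f0_hz + depth * deviation
    low, high = f0_range_hz
    return min(max(f0, low), high)


if __name__ == "__main__":
    # Two made-up frames: evenly spaced formants, then a raised F2.
    for frame in [(500.0, 1500.0, 2500.0, 3500.0), (500.0, 1700.0, 2500.0, 3500.0)]:
        print(frame, "->", round(plausible_pitch(frame), 2), "Hz")
```

Clipping keeps the artificial pitch inside a plausible speaking range even when a formant estimate is badly wrong, which matters because formant tracking in whispers is noisy.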

    Reconstruction of natural sounding speech from whispers

    This thesis explores reconstruction of natural sounding speech from whispers. As a broad research class, the generation of normally phonated speech from whispers can be useful in several types of application from different scientific fields ranging from communications to biomedical engineering. The primary focus of the thesis and current work is therefore to investigate appropriate solutions and algorithms for regenerating natural phonated speech from whispers. Interestingly, unlike other speech processing fields, many aspects of such reconstruction, in spite of the useful applications, have not yet been resolved by researchers. In particular, the outcome of this research will find at least two immediate applications which have different forms but similar solutions: a) reconstructing natural speech for laryngectomy patients, b) restoring natural pitched speech in a cell phone/telephone communication when one party talks in a whispering mode for privacy or security reasons. This thesis presents a solution for the conversion of whispers to fully-phonated speech through the modification of the CELP codec. We also present a novel method for spectral enhancement and formant smoothing during the reconstruction process, using a probability mass-density function to identify reliable formant trajectories in whispers, and apply spectral modifications accordingly. The method relies upon the observation that, whilst the pitch generation mechanism of patients with larynx damage is typically unusable, the remaining components of the speech production apparatus may be largely unaffected. The approach outlined here allows patients to regain their ability to speak (simply by whispering into an external prosthesis), yielding a more natural sounding voice than alternative solutions. Since whispered speech can be identified as the core input of the system, the acoustic features of whispers also need to be considered. 
    Despite the everyday nature of whispering, and its undoubted usefulness in vocal communications, whispers have received relatively little research effort to date, apart from some studies analysing the main whispered vowels and some quite general estimations of whispered speech characteristics. In particular, a classic vowel space determination has been lacking for whispers. For voiced speech, this type of information has played an important role in the development and testing of recognition and processing theories over the past few decades, and can be expected to be equally useful for whisper-mode communications and recognition systems. This thesis also aims to redress the shortfall by presenting a vowel formant space for whispered speech, and comparing the results with corresponding phonated samples.
    DOCTOR OF PHILOSOPHY (SCE

    Speech Recognition for Smart Homes


    Speech recognition engine adaptions for smart home dialogues

    This paper considers the needs of speech recognition-based dialogues for smart homes, and proposes a structure to allow effective speech recognition in such circumstances. A smart home system based around the Freevo home theatre platform and the Sphinx2 speech recognition engine has been designed to implement the features and grammar optimization described. In addition, the particular requirements of complexity and size minimisation for embedded systems are discussed. In practical terms, we propose a method of continuously variable vocabulary size to maintain required speech recognition accuracy in a smart home context.
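The abstract above does not describe how the vocabulary size is varied. One way to picture the idea, purely as an assumption-laden sketch, is a feedback rule that shrinks the active vocabulary when measured accuracy drops below a target and grows it otherwise. The function names, the 90% target, the 10% step, and the size limits are hypothetical, and nothing here uses the Freevo or Sphinx2 APIs.

```python
def adjust_vocabulary_size(
    current_size,
    measured_accuracy,
    target_accuracy=0.9,  # hypothetical accuracy target
    min_size=10,          # hypothetical smallest useful vocabulary
    max_size=1000,        # hypothetical largest supported vocabulary
    step_percent=10,      # hypothetical adjustment step
):
    """Return a new active-vocabulary size based on recent accuracy.

    Illustrative only: shrink by step_percent when accuracy is below the
    target, grow by step_percent otherwise, then clamp to the size limits.
    """
    step = max(1, current_size * step_percent // 100)
    if measured_accuracy < target_accuracy:
        new_size = current_size - step
    else:
        new_size = current_size + step
    return max(min_size, min(max_size, new_size))


def active_vocabulary(words_by_priority, size):
    """Keep the highest-priority words; the list is assumed pre-sorted."""
    return words_by_priority[:size]


if __name__ == "__main__":
    words = ["lights", "on", "off", "play", "stop", "volume", "up", "down"]
    size = 6
    for accuracy in (0.95, 0.80, 0.80):
        size = adjust_vocabulary_size(size, accuracy, min_size=2, max_size=len(words))
        print(accuracy, size, active_vocabulary(words, size))
```

Integer arithmetic is used for the step so that the size always changes by at least one word and never drifts through floating-point rounding.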

    Talking Ultrasound

    Near real-time speech regeneration for voice-loss patients using ultrasonic speech for phoneme detection and classification is under development by researchers at Nanyang Technological University in Singapore. Their research is important for a range of applications including: prosthesis for voice-loss patients; improving the performance of artificial speech recognition systems, especially in the presence of audible acoustic noise; and for the more private use of mobile telephones in public.

    Voiced speech from whispers for post-laryngectomised patients

    Patients who suffer from larynx and voice box deficiencies are typically unable to speak anything more than hoarse whispers without the aid of voice prostheses or rehabilitation techniques such as oesophageal speech. Speech therapists and researchers working in this field have, for many years, pursued the goal of rehabilitating such patients so as to return to them the ability to speak in a natural-sounding voice. Typically, due to removal of, or damage to, the voice box in a surgical operation such as laryngectomy, the pitch generation mechanism within these patients' voice production systems is lacking. Without a source of excitation for voiced speech, only hoarse, whisper-like and sometimes not easily perceptible sounds can be produced. This speech is obviously different from that of normal speakers, and will have lost many of the distinctive characteristics of the original speech. However, these patients typically retain the ability to whisper in a similar way to normal speakers.

    Analysis-by-synthesis method for whisper-speech reconstruction

    In this paper, a method for the real-time conversion of whispers to normal phonated speech through a code excited linear prediction (CELP) analysis-by-synthesis codec is discussed. This approach uses a template of a speaker's normal phonated speech for extraction of excitation parameters such as pitch and gain, and then injects these estimated excitations into the whispered signal to synthesize normal-sounding speech through the CELP codec. Furthermore, since restoring pitch to whispered speech requires some consideration of quality and accuracy, spectral enhancements are required in terms of formant shifting (LSP modification) and pitch injection based on a voiced/unvoiced decision. Spectral shifting is accomplished through line-spectral pair adjustment. Implementing such methods using the popular CELP codec allows integration of the technique with modern speech applications and devices. Subjective testing results are presented to determine the effectiveness of the technique.
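The two enhancement steps named in the abstract above, line-spectral pair adjustment and pitch injection gated by a voiced/unvoiced decision, can be pictured with the minimal sketch below. It is not the paper's codec: the function names, the uniform shift, the minimum LSP spacing, the 4 kHz Nyquist limit, and the simple impulse-train excitation are assumptions chosen for illustration.

```python
def shift_lsps(lsps_hz, shift_hz, nyquist_hz=4000.0, min_gap_hz=50.0):
    """Shift ascending LSP frequencies by shift_hz while keeping them valid.

    Illustrative only: each shifted value is clamped so the set stays strictly
    increasing, at least min_gap_hz apart, and inside (0, nyquist_hz).
    Assumes nyquist_hz >= min_gap_hz * (len(lsps_hz) + 1).
    """
    shifted = []
    floor = min_gap_hz
    count = len(lsps_hz)
    for index, freq in enumerate(lsps_hz):
        # Leave room above this value for all the LSPs that follow it.
        ceiling = nyquist_hz - min_gap_hz * (count - index)
        value = min(max(freq + shift_hz, floor), ceiling)
        shifted.append(value)
        floor = value + min_gap_hz
    return shifted


def excitation(voiced, num_samples, pitch_period, noise, gain=1.0):
    """Choose an excitation for one frame from a voiced/unvoiced decision.

    Voiced frames get an impulse train at the given pitch period (in samples);
    unvoiced frames keep the supplied noise-like samples unchanged.
    """
    if not voiced:
        return list(noise[:num_samples])
    return [gain if n % pitch_period == 0 else 0.0 for n in range(num_samples)]


if __name__ == "__main__":
    print(shift_lsps([300.0, 600.0, 1200.0], -100.0))
    print(excitation(True, 8, 4, noise=[0.1] * 8))
    print(excitation(False, 8, 4, noise=[0.1] * 8))
```

Clamping after the shift matters because LSPs that cross or bunch together no longer correspond to a stable synthesis filter.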

    Regeneration of speech in voice-loss patients

    This paper considers regeneration of natural-sounding speech from whisper-speech, produced by patients with vocal tract lesions affecting the glottis. Such reconstruction is important for both total and partial laryngectomy patients to improve on the monotonous robotized sound typical of electrolarynx devices. Reconstruction of speech from whispers has been demonstrated previously; however, the resulting speech does not exhibit particularly high intelligibility and, more importantly, sounds unnatural. It is the conjecture of the authors that limited pitch variations in the reconstructed speech contribute most to that lack of naturalness. In this paper, a method for pitch contour variation in reconstructed speech is presented. This method extracts voice factors which are important to ‘naturalness’ from the whispered signal and applies these to the reconstructed speech. The method is based upon our previously published work, which implemented an analysis-by-synthesis approach to voice reconstruction using a modified CELP codec.